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ABSTRACT 

In  this  paper,  we  present  a  new  algorithm  to  implement  the  homogenized  energy  hysteresis  model  with  thermal 
relaxation  for  both  ferroelectric  and  ferromagnetic  materials.  The  approach  conserves  most  of  the  accuracy  of 
the  original  algorithm,  but  enables  all  erfc  and  exp  functions  to  be  calculated  in  advance,  thereby  requiring 
that  only  basic  mathematical  operations  be  performed  in  real  time.  This  is  done  without  a  significant  increase 
in  memory  usage.  Using  this  approach,  execution  time  of  the  model  has  been  seen  to  improve  by  a  factor 
of  70  for  some  applications,  whereas  the  error  only  increases  by  five  ten  thousandths  (0.05%)  of  the  saturation 
polarization/magnetization.  The  model  with  negligible  relaxation  is  also  given,  as  it  is  used  to  illustrate  some 
optimizations.  Emphasis  is  placed  on  the  efficient  computation  of  these  models,  and  theoretical  development  is 
left  to  the  references. 
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1.  BACKGROUND 

The  homogenized  energy  framework  provides  a  powerful  and  flexible  mechanism  to  model  ferroelectric  and 
ferromagnetic  hysteresis.  In  this  framework,  Gibb’s  energy  minimization  is  considered  at  the  lattice  level,  using 
the  kernel 

P(E)  =  -  +  PRS ,  (1) 

r? 

where  E  is  the  electric  field,  P  is  the  average  polarization,  Pr  is  the  polarization  at  remanence,  rj  is  the  inverse 
susceptibility,  5  =  —1  for  negatively  oriented  dipoles,  and  <5  =  1  for  those  dipoles  with  positive  orientation.  For 
magnetic  materials,  the  kernel  is 

M(H)  =  ^  +  MrS,  (2) 

V 

where  H  is  the  magnetic  field,  M  is  the  magnetization,  Mr  is  the  magnetization  at  remanence,  and  /io  is  the 
permeability.  In  practice,  however,  r]  is  the  result  of  a  data  fit,  and  /i0  is  combined  with  77  when  performing 
the  parameter  estimation  for  magnetic  materials.  This  allows  the  same  model  to  be  used  for  both  classes  of 
materials.  For  simplicity,  this  paper  refers  solely  to  electric  fields  and  polarizations,  with  the  understanding  that 
everything  presented  here  applies  not  only  to  electric  but  also  to  magnetic  materials. 

To  account  for  material  nonhomogenuities  and  interactions  between  dipoles,  both  the  coercive  field  value  at 
which  d  changes  and  the  interaction  field  between  dipoles  are  assumed  to  be  manifestations  of  an  underlying 
distribution.  The  requirements  placed  on  these  distributions  are  minimal:  the  coercive  field  distribution  Ec  is 
strictly  non-nonnegative,  the  interactive  field  distribution  Ej  is  symmetric  about  0,  and  both  distributions  are 
bounded  by  a  decreasing  exponential.  Including  these  distributions  gives  the  model 

nO O  f>0 O 

P(E)=  /  /  vdEJviiE^PiE  +  EnEJdEtdE.,  (3) 
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#  Inputs:  E,  Ec,  wc,  Nc,  Ei,  wi,  Ni,  PR,  r),  8 

#  Output:  P 

For  k  —  0  . . .  length(£)  —  1 
P[k]  =  0 

For  i  =  0 . . .  Nc  —  1 


For  j  —  0  . . .  Ni  —  1 
If  Ei[j\  +  E[k]  +  Ec\i]d[i,j]  >0 
8[i,j]  =  1 


P[k]  =  P[k\  +  wc[i\wi\j]((Ei  [j]  +  E[k])/r]  +  Pr ) 
Else 

5[i,j]  =  -1 

P[k]  =  P[k]  +  wc[i\wi\j]((Ei\j]  +  E[k])/rj  -  Pr) 
End  If 
End  For 
End  For 
End  For 


Algorithm  1.  Algorithm  used  to  implement  the  homogenized  energy  model  with  negligible  thermal  relaxation, 


where  vc  and  i>j  represent  the  distributions  of  coercive  and  interaction  fields,  respectively.  For  computation,  the 
distribution  of  parameters  is  considered  at  a  predetermined  number  of  quadrature  points,  and  in  general  the 
only  restriction  on  the  quadrature  method  is  that  an  even  number  of  quadrature  points  are  needed  on  the  Ei 
axis.  This  assures  accurate  modelling  of  the  material  in  the  depoled  state.  Implementing  the  quadrature  method 
gives  the  model 

Nc  Ni 

P{E)  =  E  E  Wc[*K[?]P(£  +  Ei;  Ec),  (4) 

*= 1  3=1 

where  Nc  is  the  number  of  coercive  field  quadrature  points,  iVj  is  the  number  of  interaction  field  quadrature 
points,  wc  =  vcx  coercive  quadrature  weights ,  and  wi  is  defined  similarly  for  the  interaction  field.  The  [  ]  notation 
represents  array  indexing.  A  complete  discussion  of  the  development  of  this  model  can  be  found  in  [1,  3,  2,  4], 
and  a  pseudocode  implementation  is  given  in  Algorithm  1  and  Table  1. 

Algorithm  1  assumes  the  effects  of  thermal  relaxation  are  negligible.  Considering  a  material  at  absolute 
temperature  T  with  volume  V,  the  effect  of  thermal  relaxation  or  activation  depends  on  the  ratio  kT /V ,  where 
k  is  Boltzmann’s  constant.  When  this  value  is  very  small  (equivalently,  when  the  relative  lattice  volume  is 
sufficiently  large),  thermal  relaxation  is  negligible  and  the  previous  model  may  be  used.  When  this  is  not  the 


Name  Type 


E 

Vector 

Ec 

Vector 

wc 

Vector 

Nc 

Scalar  integer 

Ei 

Vector 

wi 

Vector 

Ni 

Scalar  integer 

Pr 

Scalar 

At 

Scalar 

V 

Scalar 

P 

Scalar 

T 

Scalar 

e 

Scalar 

f 

Scalar 

5 

Nc  x  Ni  matrix 

x+ 

Nc  x  Ni  matrix 

r 

Scalar  integer 

Description 

Input  electric  field  values  -  constant  sampling  rate  assumed 
Quadrature  points  for  coercive  field  distribution 
Quadrature  weights  times  distribution  levels  for  Ec  integration 
Number  of  coercive  field  quadrature  points/ weights 
Quadrature  points  for  interactive  field  distribution 
Quadrature  weights  times  distribution  levels  for  Ei  integration 
Number  of  interactive  field  quadrature  points/ weights 
Polarization  at  remanence  determined  from  parameter  estimation 
Time  between  successive  samples  of  E 
Determined  from  parameter  estimation 

Determined  from  parameter  estimation  -  Note  /3  =  \JlkT /V 
Determined  from  parameter  estimation 
Small  positive  constant,  on  order  of  1  x  10-3 

Extremely  small  positive  constant,  on  order  of  machine  accuracy  limits 
Initial  state  of  the  material  -  1  if  domain  is  positive;  -1  if  negative 
Initial  state  of  the  material  -  1  if  domain  is  positive;  0  if  negative 
Resolution  increase  for  shifting  operations 


Table  1.  Input  parameters  used  by  various  algorithms  in  this  paper.  Note:  some  algorithms  do  not  use  all  parameters. 
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case,  we  consider  the  Boltzmann  relation 


KG)  =  Cexp  (  G£)V)  -  (5) 

where  G  is  the  Gibb’s  energy.  The  local  average  polarization  is  given  by  replacing  P  in  [4]  with 

P(E)  =  x+P+  +  x_P_,  (6) 

where  x+  and  X-  are  the  fraction  of  moments  having  positive  and  negative  orientations,  respectively.  This  can 
be  simplified  slightly  by  realizing  that  x _  =  1  —  x+.  The  average  polarizations  are  given  by 

fOO  j-Pj 

P+  =  /  Pn{G{P))dP,  P-  =  /  P[i(G(P))dP,  (7) 

J  Pj  J—  OO 

where  Pi  is  the  inflection  point.  The  moment  fractions  evolve  according  to  the  differential  equation 


x+  =  -P+-X+  +  P-+X- ,  (8) 

where  p- \ and  p |_  represent  the  likelihood  of  dipoles  switching  from  positive  to  negative  and  from  negative  to 

positive,  respectively.  These  likelihoods  are  given  by 

exP(-G(P)V/(kT))dP  _  fZp'-g  exp(—G(P)V/ (kT))dP 

P+ ~  Tf™_€exp(-G(P)V/(kT))dP’  P~+  T  f~^~e  exp(—G(P)V/(kT))dP 

Details  regarding  the  development  of  these  equations  is  given  in  [4],  and  the  resulting  model  is  implemented  in 
Algorithm  2  with  parameters  as  described  in  Table  1.  Note  that  the  calculation  of  these  terms  involves  evaluating 
complementary  error  (erfc)  and  exponential  (exp)  functions.  This  significantly  reduces  the  computational 
efficiency  of  the  model  when  thermal  relaxation  is  included. 


#  Inputs:  E,  Ec,  wc,  Nc,  A/,  wi,  Nr,  PR,  77,  (3,  r, e,  /,  x+ 
Output:  P 

For  i  —  0  . . .  Nc  —  1 
PM  =  Pr  —  EM  In 
End  For 


For  k  =  0  . . .  length(E)  —  1 
P[k\  =  0 

For  i  =  0  ...  Nc  —  1 
For  j  =  0  . . .  Nj  —  1 
P^in  =  (E[k]+EI\j])/r1-PR 
P+in  =  ( E[k \  +  EM  ])/v  +  Pr 
If  EM  -  E[k\  -  EM]  >  0 
P-+  =  (erfc((P“in  +  PM  ~  z)/0)- 
erfc((P“in  +  Pi[i\  +  e)/P))/ 
(rerf  c((P“in  +  P/[i]  -  e)/0)  +  /) 

Else 


P-+  =  1/t 
End  If 


If  Ec[i\  +  E[k]  +  EM]  >  0 
P+-  =  (erf  c((P/[i]  -  e  -  P+in)//3)- 
erf c((P/[i]  +  e  -  P+in)//3))/ 
(rerfc((P/[iJ  -  e  -  P+in)//3)  +  /) 

Else 


P+-  =  1/t 


End  If 

x+[iJ\ 


x+[iJ]  +P~+Af 

1  +  A  t(p+-  +p-+) 


p_  _  /texp (-((-PiH  -  e  -  P~jn)/P)2)  |  p- 

Mterfc  m[i]  +  e  +  P-inm  +  f  ^ 

p  /3exp(-((PrH  +  ^PAj//I)2)  + 

+  ^erfc ((Pi[i\+e- P+in)/0)  +  f 
P[k]  =  P[k]  +  wc[i]wM](x+[i,  j]P+  +  (1  -  x+[i,j])P-) 


End  For 


End  For 


End  For 


Algorithm  2.  Algorithm  for  the  homogenized  energy  model  which  includes  thermal  relaxation. 
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2.  REPETITIVE  OPERATIONS 

In  a  first  step  to  improve  the  efficiency  of  the  model,  several  repetitive  operations  can  be  removed.  For  simplicity, 
consider  first  the  model  with  negligible  relaxation.  For  each  iteration  and  each  mesh  point, 

P[k]  =  P[k ]  +  wc[i\Wl[j]  ^]  +  m  ±  (10) 

must  be  calculated.  Note  that  Ej  does  not  change  through  the  course  of  the  algorithm,  and  that  dE  is  the  same 
for  all  mesh  points  in  a  given  temporal  iteration.  Thus,  each  point  in  Ej  may  be  divided  through  by  as  part 
of  algorithm  setup,  and  E[k]  divided  just  once  per  iteration  (this  division  will  later  be  removed).  This  process 
transforms  (10)  into 

P[k\  =  P[k\  +  wc[i]wI[j](EI\j]  +  E[k }  ±  PR) 

=  P[k]  +  wc[i]wi\j]Ei\j]  +  wc[i\wi[j}E[k\  ±  wc[i]wi[j]PR. 

Note  that  each  mesh  point  adds  three  terms  to  P[k}.  The  first,  u>c[i]w/[j]E/[j],  is  independent  of  the  input  field, 
and  a  scalar  value  for  YliYl  j  w  <\AW  i\j}E i[j]  may  be  calculated  and  stored  a  priori.  The  second  does  depend 
on  the  input  field,  but  only  as  a  scalar  multiple.  As  such,  this  term  can  be  simplified  to  one  multiplication 
per  temporal  iteration  by  calculating  the  scalar  term  JT  JN  wc[i]wi\j]  in  advance.  In  actuality,  we  compute 
(JT  wc[i\wi[j}Ei[j])/r]  and  (JT  ^  .  wc[i\wi[j])/r],  to  combine  this  simplification  with  the  division  mentioned 
above.  The  final  term  is  ztwc[i\wj\j\PR,  where  the  sign  depends  on  the  field  level  and  the  current  state  of  the 
material.  This  must  still  be  done  in  iteration.  However,  we  may  pre-multiply  either  wc  or  wi  by  Pr,  and  remove 
a  multiplication  from  the  loop.  Further,  the  multiplication  by  wc[i\  need  not  be  performed  for  every  mesh  point, 
but  only  once  per  row  (only  when  i  changes).  This  requires  accumulating  the  w / [j]  terms  into  a  temporary 
register,  but  this  action  is  typically  much  more  efficient  than  a  multiplication.  Incorporating  these  optimizations 
into  the  negligible  relaxation  model  yields  Algorithm  3.  While  some  of  these  changes  would  be  done  anyway 
by  an  optimizing  compiler  (exactly  which  ones  depends  on  the  compiler),  most  compilers  would  not  perform  all 
these  steps,  and  making  these  changes  can  give  a  noticeable  improvement  in  computational  efficiency.  We  found 
that  Algorithm  3  runs  almost  twice  as  quickly  as  Algorithm  1  when  both  were  compiled  under  the  gcc  compiler 
with  optimizations  enabled.  More  performance  comparisons  may  be  found  in  Section  4. 


#  Inputs:  E,  Ec,  wc,  Nc,  Ei,  wi,  Nr,  PR,  77,  8 

#  Output:  P 

#  Initial  Setup  -  not  input  field  dependent 
addit  =  0 

UJSum  —  0 

For  i  =  0...  Nc  —  1 
For  j  =  0  . . .  Ni  —  1 
addit  =  addit  +  wc[i]wi\j]Ei[j] 

UJsum  —  UJsum  T  UJc  [f]  W [  [j] 

End  For 
End  For 
addit  =  addit /  r) 

Wsum  —  U) sum  1 8] 

For  j  =  0  .  . .  Nj  —  1 
Wl[j\  =  Wl\j]PR 
End  For 

#  Begin  Iteration 


For  k  =  0  . . .  length(E)  —  1 
dE  =  E[k\  -  E[0] 

P[k]  =  addit  +  wSUmdE 
For  i  =  0  ...  Nc  —  1 
out  =  0 

For  j  =  0  . . .  Ni  —  1 
H  EI[j}  +  dE  +  Ec{i]8[i,j]>0 
S[i,j }  =  1; 

out  =  out  +  Wl[j] 

Else 

6[i,j]  =  -1 

out  =  out  —  Wl[j] 

End  If 
End  For 

P[k]  ==:  P[k]  +  wc\i\  out 
End  For 
End  For 


Algorithm  3.  Implementation  algorithm  for  the  Homogenized  energy  model  with  negligible  thermal  relaxation  -  repeated 
operations  removed. 
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To  remove  repeated  operations  from  the  relaxation  model,  we  do  not  calculate  P/,  Pmin,  and  P%n  directly, 
as  in  Algorithm  2.  This  change  effects  the  equations  for  p _ p_j _ ,  P_,  and  P+.  For  example, 

=  erfc((-Pmm  +  PM  ~  e)//3)  -  erfc((Pmm  +  PM  +  e) //?) 
rerf  c((P“in  +  P/[z]  —  e)/f3)  +  f 

=  l/,_  erf  c((P[fe]  +  Pj[j]  -  Pc[z])/(7?/3)  +  e//3)  \ 

T  V  erf  c((F[fc]  +  P/[j]  -  Ec[i\)/(r]P)  -  e//3)  +  f  )  ' 

where  the  fact  that  /  is  essentially  0  has  been  used  to  remove  an  erf  c  calculation.  We  now  see  that  division  by 
and  f3  can  be  done  in  advance  (for  all  terms  except  E[k])  or  once  per  temporal  iteration  (for  E[k\).  The  same 
holds  for  p. | Further,  let 


P_  =  P__  +  P~.  =  P_ 

1  min 


m+EM] 


- Pr , 


P_  =  - 


Pexp(~((Ec[i]  -  Ej\j]  -  E[k\)/(rjf3)  -  e//3)2) 
x/7rerfc((-£'c[z]  +P/[j]  +  E[k])/{r}(3)  +e//3)  +/’ 


(13) 


P+  ~  P+  +  Pmin  —  P+ 


E[k]+ET\j] 


Pb 


P+  = 


/3exp(—  ((— Ec[i]  -  Pj[j]  -  E[k})/{r]P)  +  e//?)2) 
^/vrerf  c((-Ec[i]  -  ET[j]  -  E[k])/(r}(3)  +  e/(3)  +  / ' 


(14) 


This  allows  rj  and  f3  to  be  divided  through  in  advance  for  these  equations,  and  simplifies  the  polarization  relation 
to 


P[k]  =  P[k ]  +  wc[i]wI\j\(x+[i,j]P+  +  (1  -  x+[i,j])P-) 
=  P[k]  +  wc[i]w![j]  (x+[i,  j)  ( 


P+  -  P-  +  2 PR  )  +  P_  +  -  PR 


E[k] 


(15) 


This  in  turn  allows  the  wc[i\wi[j}(Ei[j}/r]— Pr)  and  w c[i\w i[j\E[k\ / p  to  be  calculated  in  advance,  as  was  done  with 
the  negligible  relaxation  model.  Incorporating  these  optimizations  yields  Algorithm  4.  Numerical  experiments 
demonstrate  that  Algorithm  4  runs  about  15%  to  20%  faster  than  Algorithm  2  (again,  more  detailed  results  can 
be  seen  in  Section  4) .  However,  more  significant  then  the  speed  increase  given  by  Algorithm  4  is  the  framework  it 
establishes  for  the  more  significant  optimizations  given  in  the  next  section.  It  should  be  noted  that  Algorithms  3 
and  4  are  equivalent  (within  numerical  limits)  to  Algorithms  1  and  2,  respectively;  we  have  not  lost  any  accuracy 
by  making  these  changes. 


3.  CALCULATION  OF  LIKELIHOODS  AND  AVERAGE  POLARIZATIONS 

Whereas  some  improvement  in  the  relaxation  algorithm  was  gained  in  the  previous  section,  Algorithm  4  is  still 
many  times  slower  than  Algorithm  3.  Much  of  this  time  is  spent  in  calculating  the  erfc  and  exp  functions, 

which  occur  in  the  equations  for  p p. | ,  P_,  and  P+.  Examining  the  equations  for  p j_  and  P_,  the  only 

non-constant  terms  are  E[k],  Ej ,  and  Ec ,  and  these  always  occur  together  in  the  same  pattern.  As  such,  we 
write  them  as  a  function  of  a  single  variable 

E^  =  E[k]+E!\j]-Ec[i\.  (16) 

This  gives  p _ and  P_  as  functions  of  a  single  variable,  which  are  calculated  as 

fi_  _  /3exp(-(-P^  -  e)2) 

\/2erf  c(P/i  +  e)  +  /’ 
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#  Inputs:  E,  Ec,  wc,  Nc,  Ei,  wi,  Ni,  PR,  r ;,  p,  r,  e,  f,  x+ 

#  Output:  P 

#  Initial  Setup  -  not  input  field  dependent 
e  =  e/p 

For  j  =  0  . . .  Ni  —  1 
ei\j]  =  Ei\j]/m 
End  For 
addit  =  0 
UJSum  —  0 

For  i  =  0...  Nc  —  1 
Ec[i\  =  Ec[i]/(rjp) 

For  j  =  0  . . .  Ni  —  1 
addit  =  addit  +  wc[i]wi\j]Ei\j\ 

W  sum  ==  Wsum  +  wc[i]wi[j] 

End  For 
End  For 

addit  =  P  addit  —  PrWsutu 

UJ  sum.  —  U)  sumP 

#  Begin  Iteration 

For  k  =  0  . . .  length(E)  —  1 
E[k]  =  E[k]/(Vp) 

P[k]  =  addit  +  wSumE[k\ 

For  *  =  0  . . .  Nc  —  1 
out  =  0 

For  j  =  0  . . .  Ni  —  1 


tmp  =  erfc(E[fc]  +  Ei\j ]  —  Ec[i]  +  e) 

If  Ec[i]  -  E[k\  -  E![j]  >  0 

=  1  , , _ trrvp _ . 

P  +  r  erfc  (E[k\  +  Ei\j]  —  Ec[i]  —  e)  +  f 

Else 


P-+  = 


End  If 


P-  =  - 


Pexp(-(-E[k\  -  Ei[j\  +  Ec[{\  -  e)2) 


y/n  tmp  +  / 
tmp  =  erfc(— E[k\  —  Ei[j]  —  Ec[i]  +  e) 

If  Ec[i]  +  E[k\  +  Eft ]  >  0 

=  l(, _ tmp _ 

r  erfc  {—E[k\  —  Ei[j]  —  Ec[i\  —  t)  +  f 

Else 

1 

P+-  =  - 

T 

End  If 


P+  = 


Pexp(-(-E[k]  -  Eft]  -  Ec[i]  +  e)2) 
ypK  tmp  +  f 


[  ,  _  x+[j,j]+p-+At 

x+[t’jl  1  +  At(p+_  +  p-+) 


out  =  out  +  Wi\j\{x+[i,  j](P+  -  P-  +  2 Pr)  +  P-) 
End  For 

P[k]  =  P[k]  +  wc\i\  out 
End  For 
End  For 


Algorithm  4.  Implementation  of  the  homogenized  energy  algorithm  including  thermal  relaxation  -  repeated  operations 
removed. 


If  Elt  >  0 

1 

P-+  =  ~ 

T 

Else 


(18) 


erfc(EM  +  e)  \ 
erfc (E,j,  -e)  +  f)  ' 

This  fact  is  illustrated  in  Figure  1.  Notice  that  changing  Ec,  Ei,  or  E[k]  shifts  the  resulting  likelihood  function, 
but  does  not  otherwise  change  the  shape  of  the  function.  If  the  solutions  of  (17)  and  (18)  are  known  for  all 

E ^  G  IR,  then  the  loop  iterations  need  only  look  up  the  the  values  of  p _ and  P_ ,  rather  than  perform  the  complex 

calculations  on  each  iteration.  The  same  process  holds  for  p+_  and  P+ ,  except  that  E M  =  —E[k]  —  Eft]  —  Eft], 
Of  course,  it  is  impossible  to  calculate  and  store  in  memory  the  solution  of  (17)  or  (18)  for  all  E M  £  IR. 
However,  it  is  possible  to  bound  both  the  range  and  resolution  needed  for  E ll,  thereby  obtaining  a  vector  that 
can  be  calculated  and  stored  in  memory.  To  bound  the  range,  one  of  two  options  may  be  used.  First,  if  the  input 
field  level  is  bounded  by  an  arbitrary  number,  a  minimum  and  maximum  possible  E ^  may  be  calculated.  To 
do  this,  we  note  that  the  minimum  and  maximum  values  of  Ec  and  Ej  are  known  from  the  quadrature  points. 
The  minima  and  maxima  can  be  added  together  to  bound  E /i.  A  second  approach  to  bound  E M  is  to  recall  that 
the  distribution  for  Ec  is  bounded  by  a  decaying  exponential.  As  such,  for  E M  >  EC[NC  —  1] ,  we  may  assume 
the  distribution  is  close  enough  to  0  to  be  treated  as  such,  and  therefore  all  dipoles  are  oriented  in  the  positive 
direction.  Likewise,  for  values  of  E ^  <  — EC[NC  —  1],  all  dipoles  are  oriented  in  the  negative  direction.  When  all 
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(a)  (b) 

Figure  1.  (a)  Shift  caused  by  a  different  coercive  field  value  with  0  applied  field,  (b)  Shift  caused  by  different  applied 
fields  with  Ec  =  0. 


dipoles  line  up  uniformly,  the  material  acts  as  a  single  domain  and  behaves  in  a  linear  fashion  as  given  by  the 

kernel  (1).  Thus,  we  only  need  to  know  the  values  of  p _ p  and  P_  for  E M  £  [— EC[NC  —  1],  EC[NC  —  1]].  Values 

outside  this  range  will  simply  give  0  for  P_  and  P+  and  either  0  or  1/r  for  p _ |_  and  p- \ _ (depending  on  whether 

the  out  of  range  value  is  positive  or  negative).  The  first  approach  yields  a  slightly  simpler  final  model,  but 
places  an  additional  constraint  on  the  input  field  level.  The  second  approach  requires  no  additional  constraints, 
but  does  require  checking  for  out-of-bounds  values  (which  can  never  occur  in  the  first  approach).  To  maintain 
generality,  the  second  approach  is  employed  here,  with  one  modification.  To  reduce  the  number  of  out-of-bounds 
calculations  (at  the  expense  of  memory  usage) ,  we  allow  E M  £  [— Ec  [iVc  —  1  ]—Ei  [ iVj  —  1  },EC  [iVc  —  1]  +  Ej  [Ni  —  1]] . 
As  mentioned  above,  all  the  points  just  added  to  the  range  of  Et,  will  give  repetitive  values  for  the  likelihoods 
and  average  polarizations.  However,  given  an  E[k ]  and  Ec[i\,  we  may  now  iterate  over  all  values  of  Ej ,  checking 
only  the  first  for  an  out  of  bounds  condition.  All  other  values  that  do  not  give  E M  £  [— EC[NC  —  1  ],EC[NC  —  1]] 
will  receive  the  same  out  of  bounds  values  as  before,  without  additional  conditional  logic  to  check  their  bounds. 

We  have  limited  the  range  of  A;J  that  must  be  considered,  but  that  by  itself  still  gives  infinitely  many  possible 
values.  The  resolution  of  E M  must  now  be  set.  This  problem  should  be  familiar  to  anyone  who  has  designed 
numerical  methods.  The  higher  the  resolution,  the  more  accuracy  is  obtained,  but  the  more  memory  is  needed. 
If  the  resolution  is  set  arbitrarily,  a  very  large  amount  of  memory  is  needed  to  give  accurate  results  (potentially 

many  tens  to  a  couple  hundred  thousand  points  or  more  for  each  of  p _ |_,  p. \ _ ,  P_,  and  P+).  This  is  unacceptably 

large  for  most  real-time  applications.  To  improve  this,  we  must  restrict  the  quadrature  method  over  E j  to  one 
of  the  Newton-Coates  formulas  (to  maintain  the  even  number  of  quadrature  points,  it  must  be  an  even  degree 
formula  such  as  trapezoid  or  Simpson’s  3/8  rule  over  an  odd  number  of  intervals).  This  restriction  yields  equally 
spaced  quadrature  points.  The  resolution  can  now  be  set  as  some  positive  integer  divisor  r  of  the  Ej  stepsize. 
Quantization  error  will  still  be  introduced  when  Ec  and  E[k\  are  not  zero;  however,  no  additional  quantization 
is  added  by  iterating  over  Ej.  This  reduces  the  amount  of  error  significantly.  While  the  exact  amount  of  error 
depends  on  the  parameters  being  used,  in  test  cases  we’ve  run  less  than  a  thousand  points  need  to  be  calculated 

and  stored  for  p _ p _ ,  P_,  and  P+  in  order  to  limit  the  increased  quantization  error  to  values  well  below 

those  introduced  by  data  collection  and  quadrature  method.  Two  such  test  cases  are  explored  in  Section  4. 

The  final  relaxation  algorithm,  including  the  optimizations  described  in  this  section,  is  given  in  Algorithm  5. 
A  few  things  should  be  noted.  No  exp  or  erf  c  functions  need  to  be  computed  in  real  time  -  all  can  be  calculated 
and  stored  in  advance.  In  addition,  the  number  of  additions,  subtractions,  multiplications  and  divisions  needed 
per  iteration  has  been  reduced.  Experiments  detailed  in  Section  4  indicate  this  algorithm  runs  as  little  as  1.6 
times  slower  than  the  optimized  negligable  relaxation  model  (Algorithm  3),  although  the  exact  difference  does 
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Else 


#  Inputs:  E,  Ec,  wc,  Nc,  Ei,  wi,  Ni,  Pr,  r),  /3,  r,  e, 

#  f,  x+,  r 

#  Output:  P 

#  Initial  Setup  -  not  input  field  dependent 

e  =  e//3 
addit  =  0 

UJSum  —  0 

For  i  =  0  . . .  Nc  —  1 
For  j  =  0  . . .  Ni  —  1 
addit  =  addit  +  wc[i]wi\j]Ei\j\ 

W  sum  ==  W  sum  +  wc[i\wi[j] 

End  For 
End  For 

addit  =  addit /eta  —  PRWSUm 

VJsum  —  W  sum  l  eta 

esteP  =  (-Er[l]  —  -E/[0])/r 
Nr  =  Ni  r 

If  Ec[Nc  -  1]  >  Ei  [Ei  -  1] 

increase  =  cei\((Ec[Nc  —  1]  —  Ei[Ni  —  1  ])/estcV)  +  Nr 
Else 

increase  =  Nr  —  floor  ((I5j[IVj  —  1]  —  EC[NC  —  1  ])/esteP) 
End  If 

step  =  esteP/ (v/3) 

N p  =  2  increase  +  Nr 
eff  =  Ei[0]/(j]/3) 

For  j  —  0  .  . .  Np  —  1 
Ep  =  eff  —  step  increase  +  step  j 
tmp-  =  erf  c  (Ep  +  e) 
tmp+  =  erf  c(—Ep  +  t ) 

If  Ep  >  0 
p-+\j]  =  i/T 

P+-\j]  =  {1  -  tmp+/(exfc(-Ep  -  e)  +  f))/r 


P-  +  IJ]  =  (1  -  tmp- j  (erfc (Ep  -  e)  +  f))/r 

P+-\j]  =  l/T 
End  If 

P-  [i]  =  ~/3exp (-(Ep  +  e)2)/(Vn(tmp-  +  /)) 

P+\j]  =  /3exp{-{~Ep  +  e)2)/(0r  (tmp+  +  /)) 

End  For 

#  Begin  Iteration 
For  k  =  0  . . .  lengtli(E) 

P[k]  =  addit  +  wSumE[k] 

kshift  —  E[k[/estep 

For  i  =  0  . . .  Nc  —  1 
i  shift  =  Ec[i[/estep 

j-  =  round  (increase  +  kshift  —  ishift ) 

If  j-  <  0  Then  j-  =  0  End  If 

If  j -  +  Nr  >  Np  Then  j-  =  Np  —  Nr  —  1  End  If 

j+  =  round  (increase  +  kshift  +  ishift) 

If  j+  <  0  Then  j+  =  0  End  If 

If  j+  +  Nr  >  Np  Then  j+  =  Np  -  Nr  -  1  End  If 

out  =  0 

For  j  =  0  . . .  Ni  —  1 
j-  =j-+r 
3+  =j++r 

f.  =  x+[i,j}+p-+[j-\  At 

+  1  +  A  t(p-+\j-]+p+-\j+]) 

out  =  out  +  wi[j](x+[i,j](P+\j+\  -  P-\j~]  +  2 PR)+ 

P-  b'-D 

End  For 

P[k]  =  P[k]  +  wc[i }  out 
End  For 
End  For 


Algorithm  5.  Improved  algorithm  for  the  homogenized  energy  model  with  thermal  relaxation. 

depend  on  the  parameters  input  and  the  compiler/hardware  being  used.  Regardless,  this  is  a  vast  improvement 
over  the  100  times  slower  execution  exhibited  by  Algorithm  4.  It  should  also  be  noted  that,  while  the  Ef 
quadrature  has  been  constrained  to  a  Newton-Coates  formula,  Ec  has  not  had  any  constraints  placed  on  it, 
other  than  those  mentioned  in  Section  1.  Finally,  note  that  the  choice  of  constraining  and  working  along  Ei  is 
arbitrary.  An  analogous  model  could  be  developed  with  constraints  placed  on  Ec  instead. 


4.  PERFORMANCE  AND  ERROR  COMPARISONS 

The  exact  performance  of  Algorithm  5  depends  on  the  hardware  and  model  parameters  in  use.  Table  2  provides  a 
performance  comparison  for  an  example  ferroelectric  material  (from  [3]).  All  five  algorithms  were  tested  with  the 
same  input  of  approximately  15000  field  points,  and  used  Simpson’s  3/8  rule  on  21  quadrature  intervals  in  both 
the  Ec  and  Ei  distributions  (64  quadrature  points).  Models  were  compiled  and  run  on  two  different  platforms 
to  compare  the  effects  of  different  architectures.  All  algorithms  were  implemented  as  c  language  MATLAB  mex 


Linux 

seconds  relative  time 

seconds 

Sun 

relative  time 

Negligible  relaxation 

Original  (Algorithm  1) 

2.2 

1.8 

13.1 

1.6 

Optimized  (Algorithm  3) 

1.3 

1 

8.3 

1 

Thermal  relaxation 

Original  (Algorithm  2) 

153.1 

122.5 

318.3 

38.4 

Intermediate  (Algorithm  4) 

127.5 

102.0 

239.3 

28.9 

Optimized  (Algorithm  4) 

2.1 

1.7 

13.1 

1.6 

Table  2.  Execution  time  of  five  algorithms  presented  in  this  paper.  Times  are  given  for  two  different  architectures. 


files,  and  run  from  that  environment.  The  first  test  machine  used  was  a  1.7  GHz  Pentium  IV  Xeon  running  Red 
Hat  Enterprise  Linux  Workstation  3.  Code  on  this  machine  was  compiled  with  gcc  3.2.3.  The  other  test  machine 
was  a  Sunblade  100  with  a  500  MHz  Sparc  processor.  The  Sun  Workshop  Compiler  version  5.0  was  used  on  this 
machine.  For  simplicity,  these  machines  are  referred  to  as  linux  and  sun  in  the  table. 

Table  2  makes  it  clear  that  Algorithm  5  is  far  superior  computationally  to  a  direct  implementation  of  the 
homogenized  energy  model  with  thermal  relaxation,  as  given  in  Algorithm  2.  The  test  detailed  in  the  table  gives 
almost  a  25  times  improvement  on  the  RISC-based  SPARC  processor,  and  over  a  70  times  improvement  on  the 
more  CISC-designed  Pentium  IV.  Tests  with  other  parameters  have  yielded  slightly  less  dramatic  differences,  but 
still  give  marked  improvement.  For  example,  the  same  Pentium  IV  machine  gave  a  25  times  improvement  for 
the  computation  speed  of  Algorithm  5  when  run  on  an  example  ferromagnetic  material  (from  [5]).  The  reason 
for  this  difference  is  believed  to  be  related  to  the  internal  computer  architecture,  but  is  not  yet  fully  understood. 

As  noted  before,  the  improved  run  time  of  Algorithm  5  is  not  free;  increased  quantization  error  has  been  in¬ 
troduced  into  the  polarization/magnetization  calculations.  However,  this  quantization  error  is  quite  manageable. 
Figure  2  illustrates  the  input,  output,  and  difference  between  the  original  relaxation  algorithm  (Algorithm  2)  and 
the  optimized  one  presented  in  this  paper  (Algorithm  5)  for  r  =  10,  and  Table  3  gives  the  maximum  and  root 
mean  square  (RMS)  error  for  r  values  between  1  and  64.  Notice  that  in  all  cases,  the  maximum  error  introduced 
by  the  optimizations  is  less  than  0.6%  (1  part  in  167)  of  the  saturation  polarization  or  magnetization.  Further, 
note  that  error  is  inversely  proportional  to  r.  Thus,  error  can  be  roughly  halved  by  doubling  r.  This  should  allow 
the  faster  algorithm  to  perform  to  almost  any  desired  level  of  accuracy  (when  compared  to  the  original  model). 
A  larger  r  has  very  little  effect  on  the  computational  speed  of  the  algorithm,  as  it  only  requires  accesses  of  larger 

constant  arrays  ( p _ |_,  p _| _ ,  P_,  and  P+).  No  significant  computation  difference  was  seen  for  r  values  of  1  to 

16  in  the  ferroelectric  example,  and  no  significant  difference  was  seen  for  any  r  in  Table  3  for  the  ferromagnetic 
example.  However,  the  size  of  likelihood  and  average  polarization/magnetization  arrays  is  proportional  to  r,  so 
more  memory  must  be  utilized  for  finer  resolutions,  but  is  still  fairly  small  for  moderate  r.  With  r  =  1,  only  502 


r 

Ferroelectric  Example 

max  error  RMS  error 

Ferromagnetic  Example 

max  error  RMS  error 

i 

4.42  x  10"4 

0.13% 

1.24  x  10"4 

0.035% 

2007 

0.44% 

994 

0.22% 

2 

9.97  x  10"4 

0.28% 

1.97  x  10"4 

0.056% 

2668 

0.58% 

1111 

0.24% 

4 

4.74  x  10"4 

0.14% 

9.15  x  10"5 

0.028% 

1147 

0.25% 

502 

0.11% 

8 

2.64  x  10"4 

0.075% 

4.72  x  10"5 

0.013% 

561 

0.12% 

244 

0.053% 

10 

1.95  x  10"4 

0.056% 

3.73  x  10-5 

0.011% 

432 

0.094% 

194 

0.042% 

16 

1.47  x  10"4 

0.042% 

2.31  x  10"5 

0.0066% 

320 

0.070% 

121 

0.026% 

32 

9.30  x  10-5 

0.027% 

1.27  x  10"5 

0.0036% 

182 

0.038% 

61.3 

0.013% 

64 

6.73  x  10-5 

0.019% 

7.78  x  10-6 

0.0022% 

136 

0.030% 

33.0 

0.0072% 

Table  3.  Error  introduced  by  Algorithm  5,  when  compared  to  the  original  Algorithm  2.  Percentages  are  of  saturation 
polarization/magnetization. 
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error  between  algorithm  2  and  5  (C/m2)  polarization  (C/m2)  electric  field  (MV/m) 


time  (s) 

(a)  input  electric  field 


20  12345678 

time  (s) 

(c)  increased  quantization  error  with  r=10 


(d)  input  magnetic  field 


(e)  magnetization  hysteresis  loops 


(f)  increased  quantization  error  with  r=10 


Figure  2.  Numerical  simulations  showing  accuracy  of  thermal  relaxation  algorithm  presented  in  this  paper  versus  the 
standard  algorithm  from  the  literature.  Note  difference  in  scales  between  results  plot  and  error  plot. 
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values  need  to  be  stored  for  the  ferroelectric  example  (64  quadrature  points  for  each  of  Ec  and  Ej).  At  r  =  10, 
this  value  is  up  to  5006  values  for  each  of  the  four  arrays.  The  ferromagnetic  example  benefits  from  a  maximum 
Ec  less  than  the  maximum  Ej  and  needs  only  152  values  for  r  =  1  and  1514  values  for  r  =  10  (again  with  64 
quadrature  points  for  each  of  Ec  and  Ej).  This  suggests  that  memory  (and  in  some  cases  speed  as  well)  may 
be  optimized  in  the  ferroelectric  example  by  optimizing  along  Ec  instead  of  Ej  (as  is  mentioned  at  the  end  of 
Section  3),  but  this  is  not  further  explored  here.  In  either  example,  this  level  of  memory  is  quite  obtainable  in 
most  modern  microprocessor  or  gate  array  devices. 

The  error  figures  were  obtained  by  comparing  the  same  quadrature  routines  in  both  Algorithms  2  and  5. 
While  this  provides  a  good  measure  for  the  quantization  error  introduced  by  the  changes  made  herein,  it  does 
tend  to  ignore  the  biggest  drawback  to  Algorithm  5:  the  restriction  to  Newton-Coates  quadrature  on  the  Ej 
distribution.  The  dominant  factor  influencing  error  is  often  the  quadrature  method  in  use.  This  is  especially 
true  around  zero  applied  field  where  the  accuracy  is  often  most  desired.  Care  in  choosing  a  quadrature  method 
is  a  general  numerical  necessity,  and  is  not  discussed  in  detail  here.  However,  we  note  that  the  speed  advantage 
of  Algorithm  5  allows  many  more  quadrature  intervals  to  be  considered  while  still  giving  a  faster  execution 
speed  than  Algorithm  2.  More  quadrature  intervals  do  require  more  memory,  and  applications  that  need  very 
high  accuracy  with  memory  constrains  in  the  low  kilobytes  may  need  to  stop  their  optimizations  at  Algorithm  4 
(which  uses  less  memory  than  Algorithm  2).  For  most  applications,  however,  the  accuracy  lost  in  Algorithm  5  by 
requiring  Newton-Coates  quadrature  may  be  recovered  by  increasing  the  number  of  quadrature  intervals,  while 
still  maintaining  a  faster  execution  speed  over  the  origin  relaxation  algorithm. 
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